{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Koda Retriever: Quickstart\n",
    "\n",
    "*For this example non-production ready infrastructure is leveraged, and default categories (and corresponding alpha values) are used.*\n",
    "\n",
    "More specifically, the default sample data provided in a free start instance of [Pinecone](https://www.pinecone.io/) is used. This data consists of movie scripts and their summaries embedded in a free Pinecone vector database.\n",
    "\n",
    "### Agenda:\n",
    "- Setup\n",
    "- Koda Retriever: Retrieval \n",
    "- Koda Retriever: Query Engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import all the necessary modules\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.core.postprocessor import LLMRerank\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.vector_stores.pinecone import PineconeVectorStore\n",
    "from llama_index.core import Settings\n",
    "from llama_index.core.query_engine import RetrieverQueryEngine\n",
    "from llama_index.packs.koda_retriever import KodaRetriever\n",
    "import os\n",
    "from pinecone import Pinecone"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Building *required objects* for a Koda Retriever.\n",
    "- Vector Index\n",
    "- LLM/Model\n",
    "\n",
    "Other objects are *optional*, and will be used if provided:\n",
    "- Reranker\n",
    "- Custom categories & corresponding alpha weights\n",
    "- A custom model trained on the custom info above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pc = Pinecone(api_key=os.environ.get(\"PINECONE_API_KEY\"))\n",
    "index = pc.Index(\"sample-movies\")\n",
    "\n",
    "Settings.llm = OpenAI()\n",
    "Settings.embed_model = OpenAIEmbedding()\n",
    "\n",
    "vector_store = PineconeVectorStore(pinecone_index=index, text_key=\"summary\")\n",
    "vector_index = VectorStoreIndex.from_vector_store(\n",
    "    vector_store=vector_store, embed_model=Settings.embed_model\n",
    ")\n",
    "\n",
    "reranker = LLMRerank(llm=Settings.llm)  # optional"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building Koda Retriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = KodaRetriever(\n",
    "    index=vector_index,\n",
    "    llm=Settings.llm,\n",
    "    reranker=reranker,  # optional\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieving w/ Koda Retriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NodeWithScore(node=TextNode(id_='7', embedding=None, metadata={'box-office': 1671537444.0, 'title': 'Jurassic World', 'year': 2015.0}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Set in a fully functioning dinosaur theme park on Isla Nublar, Jurassic World faces chaos when a genetically engineered dinosaur, the Indominus Rex, escapes. Stars Chris Pratt and Bryce Dallas Howard.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=8.0)]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"How many Jurassic Park movies are there?\"\n",
    "results = retriever.retrieve(query)\n",
    "\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Those results don't look quite palletteable though. For that, lets look into making the response more *natural*. For that we'll likely need a Query Engine."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Query Engine w/ Koda Retriever\n",
    "\n",
    "Query Engines are [Llama Index abstractions](https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/root.html) that combine retrieval and synthesization of an LLM to interpret the results given by a retriever into a natural language response to the original query. They are themselves an end-to-end pipeline from query to natural langauge response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'There are five Jurassic Park movies.'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query_engine = RetrieverQueryEngine.from_args(retriever=retriever)\n",
    "\n",
    "response = query_engine.query(query)\n",
    "\n",
    "str(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
